Overview

Dataset statistics

Number of variables: 19
Number of observations: 9798
Missing cells: 11439
Missing cells (%): 6.1%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 1.4 MiB
Average record size in memory: 152.0 B
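The missing-cell percentage is the number of missing cells divided by the total number of cells (observations × variables). The per-variable missing counts listed in the alerts sum to exactly 11,439. A minimal check of both figures (the helper name is illustrative, not part of ydata-profiling):

```python
def missing_cell_pct(missing_cells: int, n_observations: int, n_variables: int) -> float:
    """Missing cells as a percentage of all cells in the table."""
    return 100.0 * missing_cells / (n_observations * n_variables)


# Per-variable missing counts copied from the alerts in this report.
missing_by_variable = {
    "imdb_id": 845, "production_companies": 1408, "release_date": 149,
    "final_year": 149, "director": 827, "star": 849, "certificate": 1587,
    "rating": 983, "runtime": 903, "genres": 907, "production_countries": 1001,
    "original_language": 965, "enrichment_source": 866,
}

total_missing = sum(missing_by_variable.values())
print(total_missing)  # 11439
print(f"{missing_cell_pct(total_missing, 9798, 19):.1f}%")  # 6.1%
```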

Variable types

Text: 9
Numeric: 6
DateTime: 1
Categorical: 3

Alerts

final_budget is highly correlated overall with final_domestic_boxoffice and 1 other field [High correlation]
final_domestic_boxoffice is highly correlated overall with final_budget and 1 other field [High correlation]
final_worldwide_boxoffice is highly correlated overall with final_budget and 1 other field [High correlation]
certificate is highly imbalanced (50.0%) [Imbalance]
enrichment_source is highly imbalanced (83.2%) [Imbalance]
imdb_id has 845 (8.6%) missing values [Missing]
production_companies has 1408 (14.4%) missing values [Missing]
release_date has 149 (1.5%) missing values [Missing]
final_year has 149 (1.5%) missing values [Missing]
director has 827 (8.4%) missing values [Missing]
star has 849 (8.7%) missing values [Missing]
certificate has 1587 (16.2%) missing values [Missing]
rating has 983 (10.0%) missing values [Missing]
runtime has 903 (9.2%) missing values [Missing]
genres has 907 (9.3%) missing values [Missing]
production_countries has 1001 (10.2%) missing values [Missing]
original_language has 965 (9.8%) missing values [Missing]
enrichment_source has 866 (8.8%) missing values [Missing]
final_worldwide_boxoffice has 428 (4.4%) zeros [Zeros]
final_domestic_boxoffice has 752 (7.7%) zeros [Zeros]
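The imbalance percentage for certificate can be reproduced as one minus the normalized Shannon entropy of its value counts, 1 - H / ln(k), where k is the number of distinct values. This formula is inferred here, not stated in the report. Applied to the 13 certificate counts from that variable's Common Values table (missing values excluded, and the three "Other values" assumed to occur once each, which is consistent with Unique: 3), it gives about 50.0%, matching the alert. A stdlib-only sketch:

```python
import math


def imbalance_score(counts: list[int]) -> float:
    """1 - normalized entropy: 0 for a uniform distribution, approaching 1 as one class dominates."""
    k = len(counts)
    if k < 2:
        return 0.0  # single class: normalization undefined, treated as 0 here (a choice)
    total = sum(counts)
    entropy = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log(k)


# certificate: R, PG-13, PG, G, NR, NC-17, Not Rated, Approved, Unrated, PG-13 (variant), 3 singletons
certificate_counts = [3607, 2696, 1351, 249, 249, 27, 20, 5, 2, 2, 1, 1, 1]
print(f"{100 * imbalance_score(certificate_counts):.1f}%")  # 50.0%
```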

Reproduction

Analysis started: 2025-03-31 19:57:07.140236
Analysis finished: 2025-03-31 19:57:11.136089
Duration: 4 seconds
Software version: ydata-profiling v4.15.1

Variables

Distinct: 7871
Distinct (%): 80.3%
Missing: 0
Missing (%): 0.0%
Memory size: 76.7 KiB

Length

Max length: 104
Median length: 50
Mean length: 15.194427
Min length: 1

Characters and Unicode

Total characters: 148875
Distinct characters: 492
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 6054
Unique (%): 61.8%

Sample

1st row: #Horror
2nd row: (500) Days of Summer
3rd row: 10,000 B.C.
4th row: 10,000 BC
5th row: 101 Dalmatians

Words

Value | Count | Frequency (%)
the | 2934 | 10.9%
of | 857 | 3.2%
a | 334 | 1.2%
and | 269 | 1.0%
in | 240 | 0.9%
2 | 214 | 0.8%
to | 187 | 0.7%
(blank) | 150 | 0.6%
man | 126 | 0.5%
movie | 92 | 0.3%
Other values (7165) | 21467 | 79.9%

Most occurring characters

Value | Count | Frequency (%)
(space) | 17073 | 11.5%
e | 15136 | 10.2%
a | 9478 | 6.4%
o | 8849 | 5.9%
n | 8111 | 5.4%
r | 7992 | 5.4%
i | 7675 | 5.2%
t | 7307 | 4.9%
s | 5850 | 3.9%
h | 5578 | 3.7%
Other values (482) | 55826 | 37.5%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 148875 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 148875 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 148875 | 100.0%

final_budget
Real number (ℝ)

High correlation 

Distinct: 755
Distinct (%): 7.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 33275943
Minimum: 1
Maximum: 5.332 × 10^8
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 76.7 KiB

Quantile statistics

Minimum: 1
5th percentile: 500000
Q1: 5000000
Median: 17500000
Q3: 40000000
95th percentile: 1.3 × 10^8
Maximum: 5.332 × 10^8
Range: 5.332 × 10^8
Interquartile range (IQR): 35000000

Descriptive statistics

Standard deviation: 44332470
Coefficient of variation (CV): 1.3322679
Kurtosis: 9.9851176
Mean: 33275943
Median Absolute Deviation (MAD): 14225000
Skewness: 2.6741561
Sum: 3.2603769 × 10^11
Variance: 1.9653679 × 10^15
Monotonicity: Not monotonic
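The coefficient of variation is the standard deviation divided by the mean, and the IQR is Q3 minus Q1. A small sketch that re-derives the reported final_budget values and shows how quartiles could be computed from raw values with the standard library. It assumes that statistics.quantiles with method="inclusive" (linear interpolation, the same rule as NumPy's default) matches the profiler's quantile rule, which the report does not state:

```python
import statistics


def coefficient_of_variation(std: float, mean: float) -> float:
    return std / mean


def interquartile_range(values: list[float]) -> float:
    # method="inclusive" interpolates linearly between order statistics.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1


# Reported final_budget figures.
print(round(coefficient_of_variation(44_332_470, 33_275_943), 4))  # 1.3323
print(40_000_000 - 5_000_000)  # Q3 - Q1 = 35000000, the reported IQR

# Toy data to exercise the raw-value path.
print(interquartile_range([1, 2, 3, 4, 5]))  # 2.0
```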

[Histogram with fixed size bins (bins=50)]

Common Values

Value | Count | Frequency (%)
20000000 | 357 | 3.6%
10000000 | 343 | 3.5%
30000000 | 308 | 3.1%
25000000 | 293 | 3.0%
15000000 | 292 | 3.0%
40000000 | 281 | 2.9%
5000000 | 261 | 2.7%
50000000 | 213 | 2.2%
35000000 | 212 | 2.2%
12000000 | 200 | 2.0%
Other values (745) | 7038 | 71.8%

Minimum 10 values

Value | Count | Frequency (%)
1 | 4 | < 0.1%
2 | 1 | < 0.1%
3 | 1 | < 0.1%
5 | 2 | < 0.1%
6 | 2 | < 0.1%
8 | 2 | < 0.1%
10 | 1 | < 0.1%
11 | 1 | < 0.1%
12 | 1 | < 0.1%
15 | 1 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
533200000 | 1 | < 0.1%
400000000 | 3 | < 0.1%
380000000 | 1 | < 0.1%
379000000 | 1 | < 0.1%
365000000 | 1 | < 0.1%
340000000 | 1 | < 0.1%
330400000 | 1 | < 0.1%
300000000 | 5 | 0.1%
290000000 | 1 | < 0.1%
280200000 | 1 | < 0.1%

final_worldwide_boxoffice
Real number (ℝ)

High correlation, Zeros

Distinct: 8943
Distinct (%): 91.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 96943398
Minimum: 0
Maximum: 2.923706 × 10^9
Zeros: 428
Zeros (%): 4.4%
Negative: 0
Negative (%): 0.0%
Memory size: 76.7 KiB

Quantile statistics

Minimum: 0
5th percentile: 1432.9
Q1: 5401410.8
Median: 30065016
Q3: 1.0232693 × 10^8
95th percentile: 4.1592791 × 10^8
Maximum: 2.923706 × 10^9
Range: 2.923706 × 10^9
Interquartile range (IQR): 96925519

Descriptive statistics

Standard deviation: 1.8599809 × 10^8
Coefficient of variation (CV): 1.9186257
Kurtosis: 35.144501
Mean: 96943398
Median Absolute Deviation (MAD): 29279883
Skewness: 4.6845087
Sum: 9.4985141 × 10^11
Variance: 3.459529 × 10^16
Monotonicity: Not monotonic

[Histogram with fixed size bins (bins=50)]

Common Values

Value | Count | Frequency (%)
0 | 428 | 4.4%
2000000 | 16 | 0.2%
11000000 | 15 | 0.2%
8000000 | 12 | 0.1%
10000000 | 11 | 0.1%
7000000 | 10 | 0.1%
9000000 | 9 | 0.1%
6000000 | 9 | 0.1%
4000000 | 8 | 0.1%
2500000 | 7 | 0.1%
Other values (8933) | 9273 | 94.6%

Minimum 10 values

Value | Count | Frequency (%)
0 | 428 | 4.4%
1 | 3 | < 0.1%
4 | 3 | < 0.1%
5 | 1 | < 0.1%
6 | 2 | < 0.1%
11 | 2 | < 0.1%
13 | 1 | < 0.1%
14 | 1 | < 0.1%
16 | 1 | < 0.1%
17 | 1 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
2923706026 | 1 | < 0.1%
2748242781 | 1 | < 0.1%
2743577587 | 1 | < 0.1%
2320250281 | 1 | < 0.1%
2223048786 | 1 | < 0.1%
2068223624 | 1 | < 0.1%
2056046835 | 1 | < 0.1%
2048359754 | 1 | < 0.1%
1979091486 | 1 | < 0.1%
1921206586 | 1 | < 0.1%
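final_worldwide_boxoffice has 428 zeros and a handful of grosses of only a few dollars (1, 4, 5, ...), which look more like placeholders than real figures. One option, sketched here under that assumption, is to treat zero as unknown and to flag implausibly small positive values for review. The 1,000 cut-off is an arbitrary illustration, not something the report suggests:

```python
from typing import Optional


def clean_gross(value: float, min_plausible: float = 1_000) -> tuple[Optional[float], bool]:
    """Return (cleaned value, needs_review).

    A zero is read as 'unknown' and becomes None; positive values below
    min_plausible (an arbitrary illustrative cut-off) are kept but flagged.
    """
    if value == 0:
        return None, False
    return value, value < min_plausible


# Values taken from this variable's tables: zero, two tiny grosses, the median, the maximum.
for gross in [0, 1, 4, 30_065_016, 2_923_706_026]:
    print(gross, clean_gross(gross))
```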

final_domestic_boxoffice
Real number (ℝ)

High correlation, Zeros

Distinct: 7744
Distinct (%): 79.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 43315548
Minimum: 0
Maximum: 9.3666222 × 10^8
Zeros: 752
Zeros (%): 7.7%
Negative: 0
Negative (%): 0.0%
Memory size: 76.7 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 2005226.8
Median: 17529688
Q3: 53267000
95th percentile: 1.7430967 × 10^8
Maximum: 9.3666222 × 10^8
Range: 9.3666222 × 10^8
Interquartile range (IQR): 51261773

Descriptive statistics

Standard deviation: 71688773
Coefficient of variation (CV): 1.6550356
Kurtosis: 24.247244
Mean: 43315548
Median Absolute Deviation (MAD): 17370700
Skewness: 3.9285551
Sum: 4.2440574 × 10^11
Variance: 5.1392802 × 10^15
Monotonicity: Not monotonic

[Histogram with fixed size bins (bins=50)]

Common Values

Value | Count | Frequency (%)
0 | 752 | 7.7%
8000000 | 12 | 0.1%
7000000 | 11 | 0.1%
10000000 | 11 | 0.1%
2000000 | 9 | 0.1%
4360000 | 8 | 0.1%
4000000 | 7 | 0.1%
11000000 | 6 | 0.1%
25000000 | 6 | 0.1%
36000000 | 5 | 0.1%
Other values (7734) | 8971 | 91.6%

Minimum 10 values

Value | Count | Frequency (%)
0 | 752 | 7.7%
30 | 1 | < 0.1%
264 | 1 | < 0.1%
310 | 1 | < 0.1%
388 | 2 | < 0.1%
401 | 1 | < 0.1%
423 | 1 | < 0.1%
527 | 1 | < 0.1%
528 | 1 | < 0.1%
673 | 1 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
936662225 | 2 | < 0.1%
858373000 | 1 | < 0.1%
814811535 | 1 | < 0.1%
785221649 | 1 | < 0.1%
749766139 | 1 | < 0.1%
718732821 | 1 | < 0.1%
700059566 | 1 | < 0.1%
684075767 | 1 | < 0.1%
678815482 | 1 | < 0.1%
674460013 | 1 | < 0.1%

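Domestic gross is normally a part of worldwide gross, so a row whose domestic figure exceeds a known, non-zero worldwide figure is a likely data error. The profile summarizes each column separately and cannot show this per-row relationship. A sketch of the check, assuming each row is a (domestic, worldwide) pair in the same currency; the rows below are made up:

```python
def inconsistent_rows(rows: list[tuple[float, float]]) -> list[int]:
    """Indices of rows where domestic gross exceeds a known (non-zero) worldwide gross."""
    return [
        i
        for i, (domestic, worldwide) in enumerate(rows)
        if worldwide > 0 and domestic > worldwide
    ]


# Hypothetical rows: a consistent film, a zero (unknown) worldwide figure, an inconsistent pair.
sample = [(17_529_688, 30_065_016), (936_662_225, 0), (5_000_000, 4_000_000)]
print(inconsistent_rows(sample))  # [2]
```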
Distinct: 7810
Distinct (%): 79.7%
Missing: 0
Missing (%): 0.0%
Memory size: 76.7 KiB

Length

Max length: 85
Median length: 45
Mean length: 13.451929
Min length: 1

Characters and Unicode

Total characters: 131802
Distinct characters: 459
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 5934
Unique (%): 60.6%

Sample

1st row: #horror
2nd row: (500)daysofsummer
3rd row: 10,000b.c.
4th row: 10,000bc
5th row: 101dalmatians
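The sample rows suggest that this unnamed column is the title lowercased with all whitespace removed, presumably for use as a join key. The character totals agree: the title column's 148,875 characters minus its 17,073 spaces equals this column's 131,802 characters. The derivation is inferred from these statistics, not documented in the report. A sketch that reproduces the five sample rows:

```python
def normalize_title(title: str) -> str:
    """Lowercase and drop every whitespace character; punctuation is kept."""
    return "".join(title.lower().split())


samples = ["#Horror", "(500) Days of Summer", "10,000 B.C.", "10,000 BC", "101 Dalmatians"]
print([normalize_title(t) for t in samples])
# ['#horror', '(500)daysofsummer', '10,000b.c.', '10,000bc', '101dalmatians']
```

Keeping punctuation means "10,000 B.C." and "10,000 BC" remain distinct keys, as they do in the sample rows.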

Words

Value | Count | Frequency (%)
kingkong | 5 | 0.1%
nightofthelivingdead | 4 | < 0.1%
shaft | 4 | < 0.1%
conanthebarbarian | 4 | < 0.1%
thealamo | 4 | < 0.1%
houseofwax | 4 | < 0.1%
cinderella | 4 | < 0.1%
thesignal | 4 | < 0.1%
robinhood | 4 | < 0.1%
ghostbusters | 4 | < 0.1%
Other values (7779) | 9757 | 99.6%

Most occurring characters

Value | Count | Frequency (%)
e | 15739 | 11.9%
a | 10733 | 8.1%
t | 10513 | 8.0%
o | 9262 | 7.0%
r | 8891 | 6.7%
n | 8682 | 6.6%
i | 8499 | 6.4%
s | 7771 | 5.9%
h | 6611 | 5.0%
l | 6051 | 4.6%
Other values (449) | 39050 | 29.6%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 131802 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 131802 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 131802 | 100.0%

imdb_id
Text

Missing 

Distinct: 7130
Distinct (%): 79.6%
Missing: 845
Missing (%): 8.6%
Memory size: 76.7 KiB

Length

Max length: 10
Median length: 9
Mean length: 9.0247962
Min length: 9

Characters and Unicode

Total characters: 80799
Distinct characters: 11
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 5310
Unique (%): 59.3%

Sample

1st row: tt3526286
2nd row: tt1022603
3rd row: tt0443649
4th row: tt0443649
5th row: tt0115433
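Every imdb_id is 9 or 10 characters long and uses only 't' and digits, which is consistent with a "tt" prefix followed by 7 or 8 digits. The character counts support this: 't' occurs 17,906 times, exactly twice per non-missing ID (9,798 - 845 = 8,953). A sketch of a format check; the pattern is inferred from these statistics, not from an IMDb specification:

```python
import re

# Pattern inferred from the profile: 'tt' followed by 7 or 8 ASCII digits.
IMDB_ID_PATTERN = re.compile(r"tt[0-9]{7,8}")


def is_valid_imdb_id(value: str) -> bool:
    return IMDB_ID_PATTERN.fullmatch(value) is not None


for candidate in ["tt3526286", "tt0443649", "tt123", "nm0000123"]:
    print(candidate, is_valid_imdb_id(candidate))
```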

Words

Value | Count | Frequency (%)
tt1325004 | 3 | < 0.1%
tt1073498 | 3 | < 0.1%
tt3470600 | 3 | < 0.1%
tt1318514 | 2 | < 0.1%
tt1034331 | 2 | < 0.1%
tt1821549 | 2 | < 0.1%
tt1559547 | 2 | < 0.1%
tt0141926 | 2 | < 0.1%
tt0498381 | 2 | < 0.1%
tt0388500 | 2 | < 0.1%
Other values (7120) | 8930 | 99.7%

Most occurring characters

Value | Count | Frequency (%)
t | 17906 | 22.2%
0 | 11679 | 14.5%
1 | 8299 | 10.3%
2 | 6443 | 8.0%
4 | 5820 | 7.2%
3 | 5771 | 7.1%
8 | 5323 | 6.6%
6 | 5105 | 6.3%
9 | 4955 | 6.1%
7 | 4904 | 6.1%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 80799 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 80799 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 80799 | 100.0%

production_companies
Text

Missing 

Distinct: 5976
Distinct (%): 71.2%
Missing: 1408
Missing (%): 14.4%
Memory size: 76.7 KiB

Length

Max length: 473
Median length: 220
Mean length: 67.089035
Min length: 1

Characters and Unicode

Total characters: 562877
Distinct characters: 128
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 4319
Unique (%): 51.5%

Sample

1st row: Centropolis Entertainment, Legendary Pictures, The Department of Trade, Industry and Competition of South Africa, Moonlighting Films, Warner Bros. Pictures
2nd row: Centropolis Entertainment, Legendary Pictures, The Department of Trade, Industry and Competition of South Africa, Moonlighting Films, Warner Bros. Pictures
3rd row: Walt Disney Pictures, Cruella Productions, Kanzaman S.A.M.
4th row: Bad Robot
5th row: Bad Robot
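production_companies stores a comma-separated list per film. Splitting on commas is tempting but breaks names that themselves contain a comma, such as "The Department of Trade, Industry and Competition of South Africa" in the first sample row. A sketch that protects names from a hypothetical allow-list before splitting, then counts companies across rows:

```python
from collections import Counter

# Hypothetical allow-list of company names that contain a comma.
NAMES_WITH_COMMAS = ["The Department of Trade, Industry and Competition of South Africa"]


def split_companies(field: str) -> list[str]:
    """Split a comma-separated company list without breaking allow-listed names."""
    restored = {}
    for i, name in enumerate(NAMES_WITH_COMMAS):
        if name in field:
            token = f"\x00{i}\x00"  # placeholder that contains no comma
            field = field.replace(name, token)
            restored[token] = name
    parts = (p.strip() for p in field.split(","))
    return [restored.get(p, p) for p in parts if p]


row = ("Centropolis Entertainment, Legendary Pictures, The Department of Trade, "
       "Industry and Competition of South Africa, Moonlighting Films, Warner Bros. Pictures")
print(split_companies(row))

counts = Counter(c for r in [row, "Bad Robot", "Bad Robot"] for c in split_companies(r))
print(counts.most_common(1))  # [('Bad Robot', 2)]
```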

Words

Value | Count | Frequency (%)
pictures | 5884 | 8.5%
productions | 4071 | 5.9%
entertainment | 3461 | 5.0%
films | 3449 | 5.0%
film | 1353 | 2.0%
media | 859 | 1.2%
fox | 736 | 1.1%
the | 699 | 1.0%
warner | 691 | 1.0%
company | 680 | 1.0%
Other values (7294) | 47469 | 68.4%

Most occurring characters

Value | Count | Frequency (%)
(space) | 60965 | 10.8%
e | 41595 | 7.4%
i | 41003 | 7.3%
n | 38111 | 6.8%
t | 37753 | 6.7%
r | 34837 | 6.2%
o | 29851 | 5.3%
a | 29580 | 5.3%
s | 26026 | 4.6%
, | 21063 | 3.7%
Other values (118) | 202093 | 35.9%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 562877 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 562877 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 562877 | 100.0%

release_date
Date

Missing 

Distinct: 5106
Distinct (%): 52.9%
Missing: 149
Missing (%): 1.5%
Memory size: 76.7 KiB
Minimum: 1915-02-08 00:00:00
Maximum: 2068-12-11 00:00:00
Invalid dates: 0
Invalid dates (%): 0.0%

[Histogram with fixed size bins (bins=50)]
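release_date runs up to 2068-12-11, decades after the analysis date (2025-03-31), so at least some dates are wrong. A sketch that flags known dates later than a chosen cut-off. Using the analysis date as the cut-off is an assumption, and genuinely upcoming releases may fall slightly after it:

```python
from datetime import date
from typing import Optional


def is_future(release: Optional[date], cutoff: date = date(2025, 3, 31)) -> bool:
    """True when a known release date falls after the cut-off; missing dates are not flagged."""
    return release is not None and release > cutoff


dates = [date(1915, 2, 8), date(2068, 12, 11), None]
print([is_future(d) for d in dates])  # [False, True, False]
```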

final_year
Real number (ℝ)

Missing 

Distinct: 144
Distinct (%): 1.5%
Missing: 149
Missing (%): 1.5%
Infinite: 0
Infinite (%): 0.0%
Mean: 2005.1106
Minimum: 1915
Maximum: 2068
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 76.7 KiB

Quantile statistics

Minimum: 1915
5th percentile: 1980
Q1: 1999
Median: 2007
Q3: 2014
95th percentile: 2021
Maximum: 2068
Range: 153
Interquartile range (IQR): 15

Descriptive statistics

Standard deviation: 14.583461
Coefficient of variation (CV): 0.0072731456
Kurtosis: 5.7825191
Mean: 2005.1106
Median Absolute Deviation (MAD): 7
Skewness: -0.81713557
Sum: 19347312
Variance: 212.67734
Monotonicity: Not monotonic
[Histogram with fixed size bins (bins=50)]

Common Values

Value | Count | Frequency (%)
2015 | 451 | 4.6%
2011 | 405 | 4.1%
2016 | 401 | 4.1%
2010 | 400 | 4.1%
2014 | 395 | 4.0%
2006 | 386 | 3.9%
2008 | 383 | 3.9%
2013 | 379 | 3.9%
2012 | 364 | 3.7%
2009 | 359 | 3.7%
Other values (134) | 5726 | 58.4%

Minimum 10 values

Value | Count | Frequency (%)
1915 | 2 | < 0.1%
1916 | 1 | < 0.1%
1921 | 1 | < 0.1%
1922 | 1 | < 0.1%
1924 | 1 | < 0.1%
1925 | 5 | 0.1%
1927 | 2 | < 0.1%
1928 | 3 | < 0.1%
1929 | 1 | < 0.1%
1930 | 1 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
2068 | 8 | 0.1%
2067 | 9 | 0.1%
2066 | 6 | 0.1%
2065 | 7 | 0.1%
2064 | 8 | 0.1%
2063 | 8 | 0.1%
2062 | 8 | 0.1%
2061 | 5 | 0.1%
2060 | 5 | 0.1%
2059 | 4 | < 0.1%
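The upper tail of final_year stops at exactly 2068, and every year from 2059 to 2068 is present. That pattern fits two-digit years parsed with Python's %y convention, which maps 00-68 to 2000-2068 and 69-99 to 1969-1999, so a 1968 film written as "68" becomes 2068. This is a likely explanation, not a confirmed one. A sketch showing the parsing behaviour and a simple correction that moves years after a cut-off back by a century; the cut-off of 2025 (the analysis year) is an assumption:

```python
from datetime import datetime


def fix_century(year: int, latest_plausible: int = 2025) -> int:
    """Move a year that lies after latest_plausible back by one century (a heuristic)."""
    return year - 100 if year > latest_plausible else year


# '%y' maps 00-68 to 2000-2068 and 69-99 to 1969-1999.
print(datetime.strptime("11/12/68", "%d/%m/%y").year)  # 2068
print(datetime.strptime("11/12/69", "%d/%m/%y").year)  # 1969
print(fix_century(2068), fix_century(2007))  # 1968 2007
```

The heuristic misfires on genuine upcoming releases (for example a 2026 film), so it is safer to re-parse the original date strings with an explicit four-digit year where they are available.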

director
Text

Missing 

Distinct: 3362
Distinct (%): 37.5%
Missing: 827
Missing (%): 8.4%
Memory size: 76.7 KiB

Length

Max length: 330
Median length: 152
Mean length: 14.209899
Min length: 3

Characters and Unicode

Total characters: 127477
Distinct characters: 106
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 1889
Unique (%): 21.1%

Sample

1st row: Tara Subkoff
2nd row: Marc Webb
3rd row: Roland Emmerich
4th row: Roland Emmerich
5th row: Stephen Herek

Words

Value | Count | Frequency (%)
john | 347 | 1.7%
david | 297 | 1.5%
michael | 242 | 1.2%
peter | 173 | 0.9%
robert | 166 | 0.8%
james | 165 | 0.8%
paul | 142 | 0.7%
scott | 120 | 0.6%
richard | 119 | 0.6%
lee | 118 | 0.6%
Other values (4219) | 18187 | 90.6%

Most occurring characters

Value | Count | Frequency (%)
e | 11805 | 9.3%
(space) | 11106 | 8.7%
a | 10244 | 8.0%
n | 8975 | 7.0%
r | 8571 | 6.7%
o | 7389 | 5.8%
i | 7294 | 5.7%
l | 5889 | 4.6%
t | 4502 | 3.5%
s | 4155 | 3.3%
Other values (96) | 47547 | 37.3%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 127477 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 127477 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 127477 | 100.0%

star
Text

Missing 

Distinct: 7240
Distinct (%): 80.9%
Missing: 849
Missing (%): 8.7%
Memory size: 76.7 KiB

Length

Max length: 89
Median length: 79
Mean length: 57.603196
Min length: 9

Characters and Unicode

Total characters: 515491
Distinct characters: 130
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 5592
Unique (%): 62.5%

Sample

1st row: Sadie Seelert, Haley Murphy, Bridget McGarry
2nd row: Zooey Deschanel, Joseph Gordon-Levitt, Geoffrey Arend
3rd row: Steven Strait, Camilla Belle, Cliff Curtis, Nathanael Baring
4th row: Steven Strait, Camilla Belle, Cliff Curtis, Nathanael Baring
5th row: Glenn Close, Jeff Daniels, Joely Richardson

Words

Value | Count | Frequency (%)
john | 648 | 0.9%
michael | 590 | 0.8%
james | 460 | 0.6%
robert | 399 | 0.5%
david | 387 | 0.5%
tom | 343 | 0.5%
jason | 288 | 0.4%
chris | 270 | 0.4%
kevin | 265 | 0.4%
jennifer | 248 | 0.3%
Other values (10885) | 68883 | 94.6%

Most occurring characters

Value | Count | Frequency (%)
(space) | 63835 | 12.4%
e | 43742 | 8.5%
a | 41963 | 8.1%
n | 33949 | 6.6%
r | 28763 | 5.6%
i | 28537 | 5.5%
, | 26322 | 5.1%
o | 25626 | 5.0%
l | 23476 | 4.6%
t | 16459 | 3.2%
Other values (120) | 182819 | 35.5%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 515491 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 515491 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 515491 | 100.0%

_merge
Categorical

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 76.7 KiB
right_only: 4578
left_only: 3198
both: 2022

Length

Max length: 10
Median length: 9
Mean length: 8.435395
Min length: 4

Characters and Unicode

Total characters: 82650
Distinct characters: 13
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: both
2nd row: both
3rd row: right_only
4th row: left_only
5th row: left_only

Common Values

Value | Count | Frequency (%)
right_only | 4578 | 46.7%
left_only | 3198 | 32.6%
both | 2022 | 20.6%
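A _merge column with the values left_only, right_only and both is the indicator column that pandas adds when DataFrame.merge is called with indicator=True. This table was therefore most likely built by an outer join of two sources, and the indicator was kept in the profiled data. From the counts, 3,198 + 2,022 = 5,220 rows have a match on the left side and 4,578 + 2,022 = 6,600 on the right. A stdlib sketch of what the indicator encodes for an outer join on a single key. It ignores duplicate keys, which a real merge would multiply, and the keys below are illustrative because the report does not say which join key was used:

```python
from collections import Counter


def merge_indicator(left_keys: set[str], right_keys: set[str]) -> dict[str, str]:
    """Label each key of an outer join the way pandas' _merge indicator does."""
    labels = {}
    for key in left_keys | right_keys:
        if key in left_keys and key in right_keys:
            labels[key] = "both"
        elif key in left_keys:
            labels[key] = "left_only"
        else:
            labels[key] = "right_only"
    return labels


left = {"tt0443649", "tt0115433", "tt1022603"}
right = {"tt0443649", "tt3526286"}
counts = Counter(merge_indicator(left, right).values())
print(counts["left_only"], counts["right_only"], counts["both"])  # 2 1 1
```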

Length

[Histogram of lengths of the category]

Most occurring characters

Value | Count | Frequency (%)
l | 10974 | 13.3%
t | 9798 | 11.9%
o | 9798 | 11.9%
_ | 7776 | 9.4%
n | 7776 | 9.4%
y | 7776 | 9.4%
h | 6600 | 8.0%
r | 4578 | 5.5%
i | 4578 | 5.5%
g | 4578 | 5.5%
Other values (3) | 8418 | 10.2%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 82650 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 82650 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 82650 | 100.0%

certificate
Categorical

Imbalance, Missing

Distinct: 13
Distinct (%): 0.2%
Missing: 1587
Missing (%): 16.2%
Memory size: 76.7 KiB
R: 3607
PG-13: 2696
PG: 1351
G: 249
NR: 249
Other values (8): 59

Length

Max length: 9
Median length: 8
Mean length: 2.549385
Min length: 1

Characters and Unicode

Total characters: 20933
Distinct characters: 26
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 3
Unique (%): < 0.1%

Sample

1st row: Not Rated
2nd row: PG-13
3rd row: PG-13
4th row: PG-13
5th row: G

Common Values

Value | Count | Frequency (%)
R | 3607 | 36.8%
PG-13 | 2696 | 27.5%
PG | 1351 | 13.8%
G | 249 | 2.5%
NR | 249 | 2.5%
NC-17 | 27 | 0.3%
Not Rated | 20 | 0.2%
Approved | 5 | 0.1%
Unrated | 2 | < 0.1%
PG-13 | 2 | < 0.1%
Other values (3) | 3 | < 0.1%
(Missing) | 1587 | 16.2%
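certificate mixes labels that may be equivalent (NR, Not Rated, Unrated) and contains two distinct values that both display as PG-13. The word counts for this column merge those two into 2,698 occurrences of pg-13, which suggests they differ only in surrounding whitespace; that is an inference, not something the report states. A sketch of a normalization step under those assumptions:

```python
from typing import Optional

# Hypothetical alias table: which labels count as equivalent is an analyst's decision.
CERTIFICATE_ALIASES = {"NOT RATED": "NR", "UNRATED": "NR"}


def normalize_certificate(raw: Optional[str]) -> Optional[str]:
    """Trim and collapse whitespace, upper-case, then map known aliases."""
    if raw is None:
        return None
    label = " ".join(raw.split()).upper()
    return CERTIFICATE_ALIASES.get(label, label)


for raw in ["PG-13", " PG-13 ", "Not Rated", "Unrated", "R", None]:
    print(repr(raw), "->", normalize_certificate(raw))
```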

Length

[Histogram of lengths of the category]

Words

Value | Count | Frequency (%)
r | 3607 | 43.8%
pg-13 | 2698 | 32.8%
pg | 1351 | 16.4%
g | 249 | 3.0%
nr | 249 | 3.0%
nc-17 | 27 | 0.3%
not | 20 | 0.2%
rated | 20 | 0.2%
approved | 5 | 0.1%
unrated | 2 | < 0.1%
Other values (3) | 3 | < 0.1%

Most occurring characters

Value | Count | Frequency (%)
G | 4298 | 20.5%
P | 4050 | 19.3%
R | 3876 | 18.5%
- | 2727 | 13.0%
1 | 2726 | 13.0%
3 | 2698 | 12.9%
N | 296 | 1.4%
t | 42 | 0.2%
d | 28 | 0.1%
e | 28 | 0.1%
Other values (16) | 164 | 0.8%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 20933 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 20933 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 20933 | 100.0%

rating
Real number (ℝ)

Missing 

Distinct: 1723
Distinct (%): 19.5%
Missing: 983
Missing (%): 10.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.4313958
Minimum: 0.5
Maximum: 10
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 76.7 KiB

Quantile statistics

Minimum: 0.5
5th percentile: 4.9
Q1: 5.9
Median: 6.5
Q3: 7.0015
95th percentile: 7.7986
Maximum: 10
Range: 9.5
Interquartile range (IQR): 1.1015

Descriptive statistics

Standard deviation: 0.89636096
Coefficient of variation (CV): 0.13937269
Kurtosis: 2.1521949
Mean: 6.4313958
Median Absolute Deviation (MAD): 0.546
Skewness: -0.63388987
Sum: 56692.754
Variance: 0.80346297
Monotonicity: Not monotonic
[Histogram with fixed size bins (bins=50)]

Common Values

Value | Count | Frequency (%)
6.5 | 290 | 3.0%
6.2 | 266 | 2.7%
6.6 | 262 | 2.7%
6 | 262 | 2.7%
6.3 | 261 | 2.7%
6.4 | 255 | 2.6%
6.7 | 245 | 2.5%
6.1 | 244 | 2.5%
6.9 | 215 | 2.2%
5.8 | 211 | 2.2%
Other values (1713) | 6304 | 64.3%
(Missing) | 983 | 10.0%

Minimum 10 values

Value | Count | Frequency (%)
0.5 | 2 | < 0.1%
1 | 2 | < 0.1%
1.2 | 1 | < 0.1%
1.5 | 1 | < 0.1%
1.9 | 1 | < 0.1%
2 | 4 | < 0.1%
2.056 | 1 | < 0.1%
2.4 | 2 | < 0.1%
2.5 | 4 | < 0.1%
2.6 | 1 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
10 | 10 | 0.1%
9.8 | 1 | < 0.1%
9 | 1 | < 0.1%
8.708 | 1 | < 0.1%
8.7 | 2 | < 0.1%
8.6 | 2 | < 0.1%
8.566 | 2 | < 0.1%
8.549 | 1 | < 0.1%
8.538 | 2 | < 0.1%
8.519 | 2 | < 0.1%

runtime
Real number (ℝ)

Missing 

Distinct: 199
Distinct (%): 2.2%
Missing: 903
Missing (%): 9.2%
Infinite: 0
Infinite (%): 0.0%
Mean: 108.1765
Minimum: 2
Maximum: 339
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 76.7 KiB

Quantile statistics

Minimum: 2
5th percentile: 83
Q1: 95
Median: 105
Q3: 119
95th percentile: 145
Maximum: 339
Range: 337
Interquartile range (IQR): 24

Descriptive statistics

Standard deviation: 22.155153
Coefficient of variation (CV): 0.20480559
Kurtosis: 5.9755883
Mean: 108.1765
Median Absolute Deviation (MAD): 12
Skewness: 0.73393337
Sum: 962230
Variance: 490.85079
Monotonicity: Not monotonic
[Histogram with fixed size bins (bins=50)]

Common Values

Value | Count | Frequency (%)
98 | 246 | 2.5%
90 | 245 | 2.5%
100 | 241 | 2.5%
95 | 234 | 2.4%
97 | 229 | 2.3%
104 | 215 | 2.2%
107 | 213 | 2.2%
101 | 213 | 2.2%
105 | 211 | 2.2%
99 | 203 | 2.1%
Other values (189) | 6645 | 67.8%
(Missing) | 903 | 9.2%

Minimum 10 values

Value | Count | Frequency (%)
2 | 2 | < 0.1%
3 | 3 | < 0.1%
4 | 1 | < 0.1%
5 | 5 | 0.1%
6 | 2 | < 0.1%
7 | 4 | < 0.1%
8 | 4 | < 0.1%
9 | 1 | < 0.1%
10 | 3 | < 0.1%
11 | 1 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
339 | 1 | < 0.1%
266 | 1 | < 0.1%
254 | 1 | < 0.1%
251 | 1 | < 0.1%
242 | 1 | < 0.1%
240 | 1 | < 0.1%
233 | 1 | < 0.1%
229 | 1 | < 0.1%
228 | 1 | < 0.1%
225 | 1 | < 0.1%
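runtime has a thin lower tail (a minimum of 2 and several values of 11 or less) far below the 5th percentile of 83, which suggests that shorts, placeholders or data errors are mixed in with feature films. A sketch that flags known runtimes at or below a threshold. It assumes runtime is in minutes (inferred from the median of 105) and uses 40 minutes because the Academy of Motion Picture Arts and Sciences treats a feature as running more than 40 minutes:

```python
from typing import Optional


def flag_short(runtimes: list[Optional[float]], min_minutes: float = 40) -> list[int]:
    """Indices of known runtimes at or below min_minutes (missing values are skipped)."""
    return [i for i, r in enumerate(runtimes) if r is not None and r <= min_minutes]


print(flag_short([2, 105, None, 40, 339]))  # [0, 3]
```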

genres
Text

Missing 

Distinct: 1703
Distinct (%): 19.2%
Missing: 907
Missing (%): 9.3%
Memory size: 76.7 KiB

Length

Max length: 73
Median length: 61
Mean length: 21.511978
Min length: 5

Characters and Unicode

Total characters: 191263
Distinct characters: 33
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 812
Unique (%): 9.1%

Sample

1st row: Crime, Drama, Horror
2nd row: Comedy, Drama, Romance
3rd row: Adventure, Action, Drama, Fantasy
4th row: Adventure, Action, Drama, Fantasy
5th row: Adventure, Comedy, Crime
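The word counts for genres split multi-word genre names, which is most likely why science and fiction both appear exactly 1,044 times: they come from a single genre, Science Fiction. Counting on the comma separators instead keeps such names intact. A sketch, assuming genres are separated by commas as in the sample rows (the third input row is made up):

```python
from collections import Counter


def genre_counts(rows: list[str]) -> Counter:
    """Count genres per comma-separated field, keeping multi-word names whole."""
    return Counter(g.strip() for row in rows for g in row.split(",") if g.strip())


rows = ["Crime, Drama, Horror", "Comedy, Drama, Romance", "Action, Science Fiction"]
counts = genre_counts(rows)
print(counts["Drama"], counts["Science Fiction"])  # 2 1
```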
ValueCountFrequency (%)
drama 4129
16.6%
comedy 3148
12.7%
action 2338
9.4%
thriller 2331
9.4%
adventure 1778
 
7.2%
romance 1665
 
6.7%
crime 1449
 
5.8%
horror 1056
 
4.3%
science 1044
 
4.2%
fiction 1044
 
4.2%
Other values (16) 4820
19.4%
2025-03-31T15:57:17.444219image/svg+xmlMatplotlib v3.8.4, https://matplotlib.org/

Most occurring characters

ValueCountFrequency (%)
r 17127
 
9.0%
15911
 
8.3%
e 15473
 
8.1%
, 14839
 
7.8%
a 13902
 
7.3%
i 12259
 
6.4%
m 12101
 
6.3%
o 11598
 
6.1%
n 10230
 
5.3%
t 8238
 
4.3%
Other values (23) 59585
31.2%

Most occurring categories

ValueCountFrequency (%)
(unknown) 191263
100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 191263 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 191263 | 100.0%

production_countries
Text

Missing 

Distinct: 775
Distinct (%): 8.8%
Missing: 1001
Missing (%): 10.2%
Memory size: 76.7 KiB

Length

Max length: 111
Median length: 24
Mean length: 26.902694
Min length: 4

Characters and Unicode

Total characters: 236663
Distinct characters: 50
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 458
Unique (%): 5.2%

Sample

1st row: United States
2nd row: United States
3rd row: United States of America, South Africa, New Zealand
4th row: United States of America, South Africa, New Zealand
5th row: United States, United Kingdom
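production_countries uses both "United States" and "United States of America" for the same country (compare the first and third sample rows), so per-country counts are split across two spellings. A sketch that maps known aliases to one canonical name before counting; the alias table is a hypothetical starting point, and the input rows are the sample rows above:

```python
from collections import Counter

# Hypothetical alias table mapping spelling variants to one canonical name.
COUNTRY_ALIASES = {"United States": "United States of America"}


def split_countries(field: str) -> list[str]:
    """Split a comma-separated country list and canonicalize known aliases."""
    names = (name.strip() for name in field.split(","))
    return [COUNTRY_ALIASES.get(name, name) for name in names if name]


rows = [
    "United States",
    "United States of America, South Africa, New Zealand",
    "United States, United Kingdom",
]
counts = Counter(c for row in rows for c in split_countries(row))
print(counts.most_common(1))  # [('United States of America', 3)]
```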

Words

Value | Count | Frequency (%)
united | 9086 | 24.4%
states | 7670 | 20.6%
of | 7314 | 19.6%
america | 7314 | 19.6%
kingdom | 1391 | 3.7%
france | 688 | 1.8%
germany | 565 | 1.5%
canada | 507 | 1.4%
japan | 190 | 0.5%
australia | 189 | 0.5%
Other values (108) | 2332 | 6.3%

Most occurring characters

Value Count Frequency (%)
(space) 28449 12.0%
e 26361 11.1%
t 25046 10.6%
a 20214 8.5%
i 19188 8.1%
n 13814 5.8%
d 11548 4.9%
m 9527 4.0%
r 9380 4.0%
o 9372 4.0%
Other values (40) 63764 26.9%
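
The most frequent character in this column is the space. Multi-word country names contribute several, and so does the space after each comma. Below is a minimal sketch of character counting that keeps whitespace and punctuation, as the table does. `char_frequencies` is a hypothetical helper, and its input is the five sample rows of this column.

```python
from collections import Counter


def char_frequencies(values, top=10):
    """Count every character, spaces and punctuation included, across non-missing values."""
    counter = Counter()
    for v in values:
        if v is not None:
            counter.update(v)  # iterating over a str yields its characters
    total = sum(counter.values())
    ranked = [(ch, n, n / total) for ch, n in counter.most_common(top)]
    return ranked, total


rows = [
    "United States",
    "United States",
    "United States of America, South Africa, New Zealand",
    "United States of America, South Africa, New Zealand",
    "United States, United Kingdom",
]
ranked, total = char_frequencies(rows)
for ch, n, share in ranked:
    label = "(space)" if ch == " " else ch  # make the blank entry visible
    print(f"{label} {n} {share:.1%}")
```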

Most occurring categories

Value Count Frequency (%)
(unknown) 236663 100.0%

Most occurring scripts

Value Count Frequency (%)
(unknown) 236663 100.0%

Most occurring blocks

Value Count Frequency (%)
(unknown) 236663 100.0%

original_language
Text

Missing 

Distinct 165
Distinct (%) 1.9%
Missing 965
Missing (%) 9.8%
Memory size 76.7 KiB

Length

Max length 87
Median length 7
Mean length 7.2580097
Min length 3

Characters and Unicode

Total characters 64110
Distinct characters 52
Distinct categories 1
Distinct scripts 1
Distinct blocks 1

Unique

Unique 92
Unique (%) 1.0%

Sample

1st row English
2nd row English, French, Swedish
3rd row English
4th row English
5th row English, Spanish
Value Count Frequency (%)
english 7412 81.0%
french 282 3.1%
spanish 237 2.6%
german 177 1.9%
mandarin 109 1.2%
japanese 105 1.1%
arabic 84 0.9%
hindi 83 0.9%
russian 72 0.8%
italian 65 0.7%
Other values (78) 521 5.7%

Most occurring characters

Value Count Frequency (%)
n 9067 14.1%
i 8441 13.2%
s 8120 12.7%
h 8069 12.6%
l 7553 11.8%
g 7543 11.8%
E 7416 11.6%
a 1537 2.4%
e 1091 1.7%
r 831 1.3%
Other values (42) 4442 6.9%

Most occurring categories

Value Count Frequency (%)
(unknown) 64110 100.0%

Most occurring scripts

Value Count Frequency (%)
(unknown) 64110 100.0%

Most occurring blocks

Value Count Frequency (%)
(unknown) 64110 100.0%

enrichment_source
Categorical

Imbalance  Missing 

Distinct 3
Distinct (%) < 0.1%
Missing 866
Missing (%) 8.8%
Memory size 76.7 KiB
tmdb 8540
omdb_id 383
omdb_title 9

Length

Max length 10
Median length 4
Mean length 4.1346843
Min length 4

Characters and Unicode

Total characters 36931
Distinct characters 9
Distinct categories 1
Distinct scripts 1
Distinct blocks 1

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row omdb_id
2nd row omdb_id
3rd row tmdb
4th row tmdb
5th row omdb_id

Common Values

Value Count Frequency (%)
tmdb 8540 87.2%
omdb_id 383 3.9%
omdb_title 9 0.1%
(Missing) 866 8.8%

Length

Histogram of lengths of the category

Common Values (Plot)

Value Count Frequency (%)
tmdb 8540 95.6%
omdb_id 383 4.3%
omdb_title 9 0.1%
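
The two Common Values tables differ only in their denominator. The first includes the 866 missing cells, so its shares are taken over all 9,798 rows (8,540 / 9,798 ≈ 87.2%). The plot version drops the missing cells and divides by the 8,932 non-missing values (8,540 / 8,932 ≈ 95.6%). The sketch below rebuilds the column from the reported counts. It reproduces both sets of shares, plus the total character count and mean length shown above. `shares` is a hypothetical helper.

```python
from collections import Counter

# Reported category counts for enrichment_source; None stands in for missing cells.
reported = {"tmdb": 8540, "omdb_id": 383, "omdb_title": 9, None: 866}
values = [value for value, n in reported.items() for _ in range(n)]


def shares(values, dropna):
    """Category shares, with or without missing cells in the denominator."""
    kept = [v for v in values if not (dropna and v is None)]
    return {k: n / len(kept) for k, n in Counter(kept).items()}


with_missing = shares(values, dropna=False)    # denominator: 9,798 rows
without_missing = shares(values, dropna=True)  # denominator: 8,932 values

# Length statistics are computed over the non-missing values only.
present = [v for v in values if v is not None]
total_chars = sum(len(v) for v in present)
mean_length = total_chars / len(present)

print(f"tmdb: {with_missing['tmdb']:.1%} vs {without_missing['tmdb']:.1%}")
print(total_chars, round(mean_length, 7))
```

In pandas, the same switch is the `dropna` argument of `Series.value_counts(normalize=True, dropna=...)`.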

Most occurring characters

Value Count Frequency (%)
d 9315 25.2%
m 8932 24.2%
b 8932 24.2%
t 8558 23.2%
o 392 1.1%
_ 392 1.1%
i 392 1.1%
l 9 < 0.1%
e 9 < 0.1%

Most occurring categories

Value Count Frequency (%)
(unknown) 36931 100.0%

Most occurring scripts

Value Count Frequency (%)
(unknown) 36931 100.0%

Most occurring blocks

Value Count Frequency (%)
(unknown) 36931 100.0%

Interactions

Correlations

Rows and columns, in order: _merge, certificate, enrichment_source, final_budget, final_domestic_boxoffice, final_worldwide_boxoffice, final_year, rating, runtime

_merge                     1.000  0.075  0.130  0.046  0.018  0.028  0.261  0.069  0.072
certificate                0.075  1.000  0.348  0.108  0.075  0.075  0.248  0.040  0.094
enrichment_source          0.130  0.348  1.000  0.000  0.000  0.000  0.035  0.020  0.033
final_budget               0.046  0.108  0.000  1.000  0.668  0.724  0.106  0.057  0.321
final_domestic_boxoffice   0.018  0.075  0.000  0.668  1.000  0.926 -0.093  0.256  0.252
final_worldwide_boxoffice  0.028  0.075  0.000  0.724  0.926  1.000  0.019  0.281  0.294
final_year                 0.261  0.248  0.035  0.106 -0.093  0.019  1.000 -0.011  0.005
rating                     0.069  0.040  0.020  0.057  0.256  0.281 -0.011  1.000  0.394
runtime                    0.072  0.094  0.033  0.321  0.252  0.294  0.005  0.394  1.000
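
The overview's alerts flag three highly correlated pairs among the budget and box-office columns. This matrix mixes categorical columns (such as _merge and certificate) with numeric ones, so it is not a plain Pearson matrix. The sketch below does not recompute it; it only filters the published values. With a cutoff of |r| > 0.5, which is an assumed threshold and not read from the report's configuration, exactly those three pairs remain. `high_pairs` is a hypothetical helper.

```python
# Rows and columns in the order of the correlation table above.
columns = [
    "_merge", "certificate", "enrichment_source", "final_budget",
    "final_domestic_boxoffice", "final_worldwide_boxoffice",
    "final_year", "rating", "runtime",
]
matrix = [
    [1.000, 0.075, 0.130, 0.046, 0.018, 0.028, 0.261, 0.069, 0.072],
    [0.075, 1.000, 0.348, 0.108, 0.075, 0.075, 0.248, 0.040, 0.094],
    [0.130, 0.348, 1.000, 0.000, 0.000, 0.000, 0.035, 0.020, 0.033],
    [0.046, 0.108, 0.000, 1.000, 0.668, 0.724, 0.106, 0.057, 0.321],
    [0.018, 0.075, 0.000, 0.668, 1.000, 0.926, -0.093, 0.256, 0.252],
    [0.028, 0.075, 0.000, 0.724, 0.926, 1.000, 0.019, 0.281, 0.294],
    [0.261, 0.248, 0.035, 0.106, -0.093, 0.019, 1.000, -0.011, 0.005],
    [0.069, 0.040, 0.020, 0.057, 0.256, 0.281, -0.011, 1.000, 0.394],
    [0.072, 0.094, 0.033, 0.321, 0.252, 0.294, 0.005, 0.394, 1.000],
]


def high_pairs(columns, matrix, threshold=0.5):
    """Pairs whose absolute coefficient exceeds threshold, strongest first."""
    pairs = []
    for i in range(len(columns)):
        for j in range(i + 1, len(columns)):  # upper triangle only: matrix is symmetric
            r = matrix[i][j]
            if abs(r) > threshold:
                pairs.append((columns[i], columns[j], r))
    return sorted(pairs, key=lambda p: abs(p[2]), reverse=True)


for a, b, r in high_pairs(columns, matrix):
    print(f"{a} ~ {b}: {r:.3f}")
```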

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
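
Nullity correlation can be read as an ordinary correlation between two "is this cell missing?" indicator columns. The sketch below assumes the heatmap follows missingno's definition, which correlates the columns' is-null indicators, and computes the Pearson coefficient by hand. The toy columns are hypothetical, though they borrow sample values. In the alerts, release_date and final_year both have 149 missing values. Their nullity correlation would be 1.0 only if those cells fall on the same rows, which the counts alone cannot confirm. `nullity_correlation` is a hypothetical helper.

```python
from math import sqrt


def nullity_correlation(a, b):
    """Pearson correlation between the is-missing indicators of two columns."""
    x = [1.0 if v is None else 0.0 for v in a]
    y = [1.0 if v is None else 0.0 for v in b]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((p - mx) * (q - my) for p, q in zip(x, y))
    sx = sqrt(sum((p - mx) ** 2 for p in x))
    sy = sqrt(sum((q - my) ** 2 for q in y))
    if sx == 0 or sy == 0:
        # A column that is never (or always) missing has no nullity variance.
        raise ValueError("nullity correlation is undefined for a constant indicator")
    return cov / (sx * sy)


# Hypothetical toy columns, four rows each.
release_date = ["2015-11-20", None, "2008-03-07", None]
final_year = [2015.0, None, 2008.0, None]
rating = [3.1, None, None, 5.5]

print(nullity_correlation(release_date, final_year))  # missing on the same rows
print(nullity_correlation(release_date, rating))      # overlapping only half the time
```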

Sample

index | final_title | final_budget | final_worldwide_boxoffice | final_domestic_boxoffice | final_clean_title | imdb_id | production_companies | release_date | final_year | director | star | _merge | certificate | rating | runtime | genres | production_countries | original_language | enrichment_source
0 | #Horror | 1500000.0 | 0.0 | 0.0 | #horror | tt3526286 | NaN | 2015-11-20 | 2015.0 | Tara Subkoff | Sadie Seelert, Haley Murphy, Bridget McGarry | both | Not Rated | 3.100 | 97.0 | Crime, Drama, Horror | United States | English | omdb_id
1 | (500) Days of Summer | 7500000.0 | 34515303.0 | 32425665.0 | (500)daysofsummer | tt1022603 | NaN | 2009-07-17 | 2009.0 | Marc Webb | Zooey Deschanel, Joseph Gordon-Levitt, Geoffrey Arend | both | PG-13 | 7.700 | 95.0 | Comedy, Drama, Romance | United States | English, French, Swedish | omdb_id
2 | 10,000 B.C. | 105000000.0 | 269065678.0 | 94784201.0 | 10,000b.c. | tt0443649 | Centropolis Entertainment, Legendary Pictures, The Department of Trade, Industry and Competition of South Africa, Moonlighting Films, Warner Bros. Pictures | 2008-03-07 | 2008.0 | Roland Emmerich | Steven Strait, Camilla Belle, Cliff Curtis, Nathanael Baring | right_only | PG-13 | 5.500 | 109.0 | Adventure, Action, Drama, Fantasy | United States of America, South Africa, New Zealand | English | tmdb
3 | 10,000 BC | 105000000.0 | 269784201.0 | 94784201.0 | 10,000bc | tt0443649 | Centropolis Entertainment, Legendary Pictures, The Department of Trade, Industry and Competition of South Africa, Moonlighting Films, Warner Bros. Pictures | 2008-02-22 | 2008.0 | Roland Emmerich | Steven Strait, Camilla Belle, Cliff Curtis, Nathanael Baring | left_only | PG-13 | 5.500 | 109.0 | Adventure, Action, Drama, Fantasy | United States of America, South Africa, New Zealand | English | tmdb
4 | 101 Dalmatians | 54000000.0 | 320689294.0 | 136189294.0 | 101dalmatians | tt0115433 | NaN | 1996-11-17 | 1996.0 | Stephen Herek | Glenn Close, Jeff Daniels, Joely Richardson | left_only | G | 5.700 | 103.0 | Adventure, Comedy, Crime | United States, United Kingdom | English, Spanish | omdb_id
5 | 102 Dalmatians | 85000000.0 | 183611771.0 | 66957026.0 | 102dalmatians | tt0211181 | NaN | 2000-10-07 | 2000.0 | Kevin Lima | Glenn Close, Gérard Depardieu, Ioan Gruffudd | left_only | G | 4.900 | 100.0 | Adventure, Comedy, Family | United States, United Kingdom, Monaco | English | omdb_id
6 | 102 Dalmatians | 85000000.0 | 66941559.0 | 66941559.0 | 102dalmatians | tt0211181 | Walt Disney Pictures, Cruella Productions, Kanzaman S.A.M. | 2000-11-22 | 2000.0 | Kevin Lima | Glenn Close, Gérard Depardieu, Ioan Gruffudd, Alice Evans | right_only | G | 5.500 | 100.0 | Family, Comedy | United States of America | English | tmdb
7 | 10 Cloverfield Lane | 15000000.0 | 108286422.0 | 72082999.0 | 10cloverfieldlane | tt1179933 | Bad Robot | 2016-01-04 | 2016.0 | Dan Trachtenberg | John Goodman, Mary Elizabeth Winstead, John Gallagher Jr., Douglas M. Griffin | right_only | PG-13 | 6.993 | 104.0 | Thriller, Science Fiction, Drama, Horror | United States of America | English | tmdb
8 | 10 Cloverfield Lane | 15000000.0 | 110216998.0 | 72082998.0 | 10cloverfieldlane | tt1179933 | Bad Robot | 2016-03-10 | 2016.0 | Dan Trachtenberg | John Goodman, Mary Elizabeth Winstead, John Gallagher Jr., Douglas M. Griffin | left_only | PG-13 | 6.993 | 104.0 | Thriller, Science Fiction, Drama, Horror | United States of America | English | tmdb
9 | 10 Days in a Madhouse | 12000000.0 | 14616.0 | 14616.0 | 10daysinamadhouse | tt3453052 | NaN | 2015-11-11 | 2015.0 | Timothy Hines | Caroline Barry, Christopher Lambert, Kelly LeBrock, Julia Chantrey | right_only | NaN | 4.500 | 111.0 | Drama | United States of America | English | tmdb
index | final_title | final_budget | final_worldwide_boxoffice | final_domestic_boxoffice | final_clean_title | imdb_id | production_companies | release_date | final_year | director | star | _merge | certificate | rating | runtime | genres | production_countries | original_language | enrichment_source
9788 | 마더 | 5000000.0 | 17112713.0 | 547292.0 | 마더 | tt1216496 | Barunson E&A, CJ Entertainment | 2009-05-28 | 2009.0 | Bong Joon Ho | Kim Hye-ja, Won Bin, Jin Goo, Yoon Je-moon | left_only | R | 7.700 | 129.0 | Crime, Drama, Mystery | South Korea | Korean | tmdb
9789 | 명량 | 9500000.0 | 112156811.0 | 2589811.0 | 명량 | tt3541262 | Big Stone Pictures | 2014-07-30 | 2014.0 | Kim Han-min | Choi Min-sik, Ryu Seung-ryong, Cho Jin-woong, Jin Goo | left_only | NaN | 7.000 | 126.0 | War, Action, Drama, History | South Korea | Japanese | tmdb
9790 | 베를린 | 9000000.0 | 48965210.0 | 665210.0 | 베를린 | tt2357377 | CJ Entertainment, Filmmaker R&K, Union Investment Partners | 2013-01-30 | 2013.0 | Ryoo Seung-wan | Ha Jung-woo, Han Suk-kyu, Ryoo Seung-bum, Jun Ji-hyun | left_only | NaN | 6.700 | 120.0 | Action, Thriller, Crime | South Korea | German | tmdb
9791 | 복수는 나의 것 | 4000000.0 | 1954937.0 | 45289.0 | 복수는나의것 | tt0310775 | CJ Entertainment, Studio Box | 2002-03-29 | 2002.0 | Park Chan-wook | Song Kang-ho, Shin Ha-kyun, Bae Doona, Im Ji-eun | left_only | R | 7.464 | 129.0 | Action, Drama, Thriller | South Korea | Korean | tmdb
9792 | 부산행 | 8820000.0 | 2129768.0 | 2129768.0 | 부산행 | tt5700672 | Next Entertainment World, RedPeter Films, Contents Panda, Union Investment Partners, KTB Network | 2016-07-20 | 2016.0 | Yeon Sang-ho | Gong Yoo, Kim Su-an, Jung Yu-mi, Don Lee | left_only | NR | 7.750 | 118.0 | Horror, Thriller, Action, Adventure | South Korea | Korean | tmdb
9793 |  | 50000.0 | 21075.0 | 21075.0 |  | tt0255589 | Myung Films, CJ Entertainment | 2000-04-22 | 2000.0 | Kim Ki-duk | Kim Yu-seok, Suh Jung, Seo Won, Son Min-seok | left_only | NaN | 6.956 | 90.0 | Drama, Thriller | South Korea | Korean | tmdb
9794 | 아가씨 | 8575000.0 | 1983204.0 | 2006788.0 | 아가씨 | tt4016934 | Moho Film, Yong Film, CJ Entertainment | 2016-06-01 | 2016.0 | Park Chan-wook | Kim Min-hee, Kim Tae-ri, Ha Jung-woo, Cho Jin-woong | left_only | R | 8.200 | 145.0 | Thriller, Drama, Romance | South Korea | Japanese | tmdb
9795 | 올드보이 | 3000000.0 | 14980005.0 | 707481.0 | 올드보이 | tt0364569 | Show East, Egg Film, Cineclick Asia | 2003-01-01 | 2003.0 | Park Chan-wook | Choi Min-sik, Yoo Ji-tae, Kang Hye-jung, Kim Byeong-ok | left_only | R | 8.251 | 120.0 | Drama, Thriller, Mystery, Action | South Korea | Korean | tmdb
9796 | 최종병기 활 | 8000000.0 | 49000000.0 | 251200.0 | 최종병기활 | tt2025526 | Lotte Entertainment, Dasepo Club, DCG Plus, Sovik Venture Capital | 2011-08-10 | 2011.0 | Kim Han-min | Park Hae-il, Moon Chae-won, Kim Moo-yul, Ryu Seung-ryong | left_only | NR | 7.200 | 122.0 | Drama, Action, History | South Korea, United States of America | Korean | tmdb
9797 | 피에타 | 103000.0 | 3623330.0 | 21932.0 | 피에타 | tt2299842 | Next Entertainment World, Kim Ki Duk Film, Finecut | 2012-09-05 | 2012.0 | Kim Ki-duk | Cho Min-soo, Lee Jung-jin, Woo Ki-hong, Kang Eun-jin | left_only | NR | 7.100 | 104.0 | Drama | South Korea | Korean | tmdb
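
The overview reports 0 duplicate rows, but that figure counts only rows identical in every column. In the first sample rows, three films ("10,000 B.C." / "10,000 BC", "102 Dalmatians" and "10 Cloverfield Lane") each appear twice under the same imdb_id, once as left_only and once as right_only. That suggests the same film from the two sources was not matched during the join. The _merge values resemble the indicator column pandas adds with `merge(..., indicator=True)`, though the report does not say how the dataset was built. Below is a standard-library sketch that groups the first sample rows by imdb_id. `duplicated_ids` is a hypothetical helper.

```python
from collections import defaultdict

# (index, final_title, imdb_id, _merge) copied from the first sample rows above.
sample = [
    (0, "#Horror", "tt3526286", "both"),
    (1, "(500) Days of Summer", "tt1022603", "both"),
    (2, "10,000 B.C.", "tt0443649", "right_only"),
    (3, "10,000 BC", "tt0443649", "left_only"),
    (4, "101 Dalmatians", "tt0115433", "left_only"),
    (5, "102 Dalmatians", "tt0211181", "left_only"),
    (6, "102 Dalmatians", "tt0211181", "right_only"),
    (7, "10 Cloverfield Lane", "tt1179933", "right_only"),
    (8, "10 Cloverfield Lane", "tt1179933", "left_only"),
    (9, "10 Days in a Madhouse", "tt3453052", "right_only"),
]


def duplicated_ids(rows):
    """Map each imdb_id that occurs more than once to its (index, title, _merge) rows."""
    groups = defaultdict(list)
    for idx, title, imdb_id, merge in rows:
        if imdb_id is not None:  # missing IDs cannot be matched
            groups[imdb_id].append((idx, title, merge))
    return {k: v for k, v in groups.items() if len(v) > 1}


for imdb_id, group in duplicated_ids(sample).items():
    print(imdb_id, group)
```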